2025.12.02 | A four-step practical rollout of code intelligence; LongVT for precise long-video understanding
Description
The 15 papers in this episode:
[00:20] 🧠 From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence
[01:05] 🎬 LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
[01:43] 🔍 Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
[02:21] 🧠 Stabilizing Reinforcement Learning with LLMs: Formulation and Practices
[02:59] 🔍 How Far Are We from Genuinely Useful Deep Research Agents?
[03:47] ⚖ What about gravity in video generation? Post-Training Newton's Laws with Verifiable Rewards
[04:24] 🔍 The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
[05:14] 🎬 Infinity-RoPE: Action-Controllable Infinite Video Generation Emerges From Autoregressive Self-Rollout
[05:58] 🔗 TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models
[06:41] 🧠 Rectifying LLM Thought from Lens of Optimization
[07:16] ⚡ Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning
[07:54] 🚀 LFM2 Technical Report
[08:31] 🤖 GR-RL: Going Dexterous and Precise for Long-Horizon Robotic Manipulation
[09:09] 🎬 InternVideo-Next: Towards General Video Foundation Models without Video-Text Supervision
[09:44] ⚡ VLASH: Real-Time VLAs via Future-State-Aware Asynchronous Inference
[Follow Us]
You can also find us on the following platform for more content beyond the podcast:
Xiaohongshu: AI速递